Estimator Variance in Reinforcement Learning: Theoretical Problems and Practical Solutions

Author

  • Mark D. Pendrith
Abstract

In reinforcement learning, as in many on-line search techniques, a large number of estimation parameters (e.g. Q-value estimates for 1-step Q-learning) are maintained and dynamically updated as information comes to hand during the learning process. Excessive variance of these estimators can be problematic, resulting in uneven or unstable learning, or even making effective learning impossible. Estimator variance is usually managed only indirectly, by selecting global learning algorithm parameters (e.g. learning rates for TD-based methods) that are a compromise between an acceptable level of estimator perturbation and other desirable system attributes, such as reduced estimator bias. In this paper we argue that this approach may not always be adequate, particularly for noisy and non-Markovian domains, and present a direct approach to managing estimator variance: the new ccBeta algorithm. Empirical results in an autonomous robotics domain are also presented, showing improved performance using the ccBeta method.
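To make the role of the global learning-rate parameter concrete, here is a minimal sketch of the 1-step Q-learning update the abstract refers to. This is not the ccBeta algorithm itself; the table layout, function name, and constants are illustrative assumptions.

```python
def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    alpha is the kind of global step-size parameter the abstract describes:
    a large alpha tracks new information quickly but leaves high estimator
    variance under noisy rewards; a small alpha smooths noise but adapts
    slowly, so a single fixed value is always a compromise.
    """
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    td_error = r + gamma * best_next - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return td_error
```

Under a noisy reward signal, the steady-state fluctuation of the Q estimates grows with alpha; this is the estimator variance that the paper proposes to manage directly rather than through one global compromise setting.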


Similar resources

Reinforcement Learning in Situated Agents: Some Theoretical Problems and Practical Solutions

In on-line reinforcement learning, often a large number of estimation parameters (e.g. Q-value estimates for 1-step Q-learning) are maintained and dynamically updated as information comes to hand during the learning process. Excessive variance of these estimators can be problematic, resulting in uneven or unstable learning, or even making effective learning impossible. Estimator variance is usua...

Full text

Doubly Robust Off-policy Evaluation for Reinforcement Learning

We study the problem of evaluating a policy that is different from the one that generates data. Such a problem, known as off-policy evaluation in reinforcement learning (RL), is encountered whenever one wants to estimate the value of a new solution, based on historical data, before actually deploying it in the real system, which is a critical step of applying RL in most real-world applications....

Full text

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy. This problem is often a critical step when applying RL to real-world problems. Despite its importance, existing general methods either have uncontrolled bias or suffer high variance. In this work, we extend the do...

Full text
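The per-trajectory recursion behind the doubly robust estimator discussed in the two entries above can be sketched as follows. The data layout and names are assumptions for illustration; `rho` stands for the per-step importance ratio between the evaluation and behavior policies.

```python
def doubly_robust(trajectory, q_hat, v_hat, rho, gamma=1.0):
    """Doubly robust off-policy value estimate for one trajectory.

    Computed backwards over steps (s, a, r):
        DR_t = V(s_t) + rho_t * (r_t + gamma * DR_{t+1} - Q(s_t, a_t))
    The model terms V and Q act as control variates that lower variance,
    while the importance-weighted correction keeps the estimate unbiased
    when the importance ratios are exact.
    """
    dr = 0.0
    for (s, a, r) in reversed(trajectory):
        dr = v_hat[s] + rho[(s, a)] * (r + gamma * dr - q_hat[(s, a)])
    return dr
```

When the value model is exact and the data is on-policy (all ratios equal 1), the correction terms cancel and the estimate reduces to the model's own value of the start state, which is the "controlled bias" half of the doubly robust guarantee.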

Deep Reinforcement Learning

In reinforcement learning (RL), stochastic environments can make learning a policy difficult due to high degrees of variance. As such, variance reduction methods have been investigated in other works, such as advantage estimation and control-variates estimation. Here, we propose to learn a separate reward estimator to train the value function, to help reduce variance caused by a noisy reward sig...

Full text
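The control-variate (baseline) idea that the variance-reduction methods above build on can be demonstrated numerically. The toy setup below is an illustration of the general principle, not the reward-estimator method from the abstract: subtracting a baseline from the return leaves a score-function estimator unbiased while shrinking its variance.

```python
import random

def gradient_terms(returns, scores, baseline=0.0):
    """Score-function gradient samples g_i = (G_i - b) * score_i.

    When the score has zero mean and is independent of the return,
    the baseline b does not change the expectation of g, only its
    variance, which is minimized near b = E[G].
    """
    return [(g - baseline) * s for g, s in zip(returns, scores)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

random.seed(0)
# noisy returns centered at 5, symmetric +/-1 scores
returns = [5.0 + random.gauss(0.0, 1.0) for _ in range(10000)]
scores = [random.choice([-1.0, 1.0]) for _ in range(10000)]

var_plain = variance(gradient_terms(returns, scores))
var_base = variance(gradient_terms(returns, scores, baseline=5.0))
```

With the baseline set near the mean return, the per-sample variance drops from roughly Var(G) + E[G]^2 to roughly Var(G), which is why baselines and learned critics are standard variance-reduction machinery in policy-gradient methods.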

Multiagent Reinforcement Learning for Multi-Robot Systems: A Survey

Multiagent reinforcement learning for multi-robot systems is a challenging issue in both robotics and artificial intelligence. With ever-increasing interest in theoretical research and practical applications, there have been many efforts toward providing solutions to this challenge. However, there are still many difficulties in scaling up the multiagent reinforcement l...

Full text


Journal title:

Volume   Issue

Pages  -

Publication date: 1997